1. Summary.

The use of the mean of a symmetrically trimmed sample (the trimmed mean)
as an indicator of location and the use of the total sum of squares of deviations
of the same trimmed sample (the TSSD) as an indicator of the variability of the
trimmed mean are explored. The increase in variance (of the trimmed mean as
compared with the untrimmed mean) when trimming samples from an exactly normal
distribution is found to be less than 3%, 6%, and 14%, respectively, when
a total of 1/10, 2/10, 3/10, or 4/10 of the sample is trimmed away. (Trimming
will decrease the variance when the samples come from a long-tailed distribution.)
The loss of normal-theory efficiency is given for all symmetric trimmings of
samples of size ≤ 20. The appropriate divisor, by which the trimmed sum of
squares of deviations is to be divided to obtain an estimate of the variance of
the trimmed mean, is tabled for the same range.

The effect on this divisor of sampling from rectangular rather than normal
populations is found to be small, but noticeable. The empirical behavior of
the reciprocal of the divisor is found to be simple, and a theoretical
explanation for this is provided.

Further  studies  in  this  area  are  in  progress. 

2.  The  problem. 

While the sample mean and sample variance are sufficient statistics when
the sample is specified to come from a precisely normally distributed population,
so that no statistic can then be a better estimate of location than the sample
mean, and no statistic can be a better basis for estimating the variance of the
sample mean than the sample variance (or sample sum of squares of deviations),
these optimum properties fail miserably for samples from non-normal distributions
(even when these non-normal distributions are symmetrical). Thus it is of interest
to consider other indicators of location, and other bases for estimating the
variance of these indicators. We must decide how to work numerically with each
particular indicator and with each particular basis for estimating its variance.
The first requirement that our procedures must satisfy is that they be
appropriate when the parent distribution is precisely normal. (Though we may rarely
find samples from normal distributions in practice, none of us want to give up
the possibility that a few of the parent populations we face may be almost exactly
normal, and that others may be nearly normal.)

In large samples the trimmed mean and the trimmed standard deviation (the
mean and sample standard deviation of those observed values remaining out of a
sample of n when the g highest and g lowest values have been deleted) have
been shown to have quite satisfactory properties [5]. This report opens an
investigation into the properties of both these and related statistics in small
and moderate samples.

3.  Results. 

The main quantities studied are:

(1) the variance of the trimmed mean,

(2) the normal theory efficiency for location of the trimmed mean (as
referred to the untrimmed mean, i.e., the ratio of the variance of the untrimmed
mean to the variance of the trimmed mean),

(3) the average value of the trimmed sum of squares of deviations,

(4) the ratio of (3) to (1), which is the factor by which an observed
trimmed sum of squares of deviations should be divided in order to obtain an
unbiased estimate of the variance of the corresponding trimmed mean. These
values are given for n ≤ 20.

Such numerical results provide (i) a method for calculating an unbiased
estimate of the variance of a trimmed mean, and (ii) an indication of the price
that must be paid for trimming when the parent population is indeed normal. They
do not provide solutions for the following problems:

(1) If the extent of trimming is guided by the apparent quality of the
estimates provided by differently trimmed means, how much will be the downward bias
of the estimated variance of that trimmed mean which appears to have the least
variance? (This bias is due to selection and arises when the variances of the
various trimmed means are estimated as indicated below.)

(2) How much does the distribution of the ratio of trimmed mean to the
square root of its estimated variance (based upon the trimmed sum of squares of
deviations) differ from a suitable multiple of a suitable instance of Student's
t? What multiple and what degrees of freedom are suitable?

(3) How variable is the trimmed sum of squares of deviations as a basis
for estimating scale?

(4) How do trimmed means and trimmed sums of squares of deviations behave
for parent distributions that are non-normal but symmetric?

It  is  hoped  to  provide  at  least  partial  answers  to  these  questions  in  later 
reports  of  this  series. 

The most directly relevant and useful results are collected in Table 1,
Figure 1, and Table 2. Table 1 shows the loss in normal theory efficiency when
2, 3, ... observed values are deleted from each end of a sample. Figure 1
shows similar information in terms of the modified fraction of the observations
deleted from each end. Table 2 contains the divisors which convert trimmed
sums of squares of deviations into unbiased estimates of variances of trimmed
means.

4. Example.

If we are dealing with samples of 11 and choose to routinely trim 2
observations off each end of each sample, the loss of normal efficiency can be
seen from Table 1 to be 11.7%. If the population is exactly normal, the trimmed
mean will have a standard deviation some 6% greater than the untrimmed mean.

(And if the population has rather long tails, the trimmed mean will have a much
smaller standard deviation than the untrimmed mean.)
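
The 11.7% figure can be checked roughly by simulation. The sketch below is only an illustration, not part of the original tabulation: the function names, the seed, and the number of replications are arbitrary choices made here, and the result is a Monte Carlo estimate that should fall near an efficiency of 0.883 without reproducing it exactly.

```python
import random


def trimmed_mean(sample, g):
    """Mean of the values left after dropping the g lowest and g highest."""
    ordered = sorted(sample)
    kept = ordered[g:len(ordered) - g]
    return sum(kept) / len(kept)


def variance(values):
    """Population variance of a list of numbers."""
    centre = sum(values) / len(values)
    return sum((v - centre) ** 2 for v in values) / len(values)


def simulated_efficiency(n, g, reps, seed=0):
    """Estimate var(untrimmed mean) / var(trimmed mean) for standard normal samples."""
    rng = random.Random(seed)  # fixed seed so that the run is repeatable
    plain, trimmed = [], []
    for _ in range(reps):
        sample = [rng.gauss(0.0, 1.0) for _ in range(n)]
        plain.append(sum(sample) / n)
        trimmed.append(trimmed_mean(sample, g))
    return variance(plain) / variance(trimmed)


# Samples of 11 with 2 observations trimmed from each end, as in the example.
efficiency = simulated_efficiency(n=11, g=2, reps=20000)
print(f"simulated normal-theory efficiency: {efficiency:.3f}")
```

The square root of the reciprocal of this efficiency gives the ratio of standard deviations, which is where the figure of some 6% comes from.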

If we have the following 11 observations: -5, 10, 15, 11, 12, 17, -1, 8,
13, 10, 18 and proceed by trimming two from each end, we have to find the mean
and sum of squares of deviations of the remaining 7 observations. Hence

      y      y^2
     10      100
     15      225
     11      121
     12      144
      8       64
     13      169
     10      100
    ---     ----
     79      923

$\breve{y} = 79/7 = 11.29$ = trimmed mean,

$T = 923 - 79^2/7 = 31.43$ = trimmed sum of squares of deviations.

From Table 2 we find that 31.43 should be divided by 18.955 to obtain an
unbiased estimate of the variance of $\breve{y}$. The standard error of
$\breve{y}$ is thus $\sqrt{31.43/18.955} \approx 1.29$.
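
The arithmetic of this example can be retraced with a few lines of code. This is a minimal sketch under two assumptions: the eleven observations are read as -5, 10, 15, 11, 12, 17, -1, 8, 13, 10, 18 (the reading that agrees with the column of squares), and the divisor 18.955 is taken as quoted from Table 2 rather than computed. The function name is hypothetical.

```python
import math


def trimmed_mean_and_tssd(sample, g):
    """Return (trimmed mean, trimmed sum of squares of deviations)."""
    kept = sorted(sample)[g:len(sample) - g]
    h = len(kept)
    total = sum(kept)
    total_sq = sum(v * v for v in kept)
    # TSSD via the shortcut: sum of squares minus (sum)^2 / h
    return total / h, total_sq - total * total / h


observations = [-5, 10, 15, 11, 12, 17, -1, 8, 13, 10, 18]
mean, tssd = trimmed_mean_and_tssd(observations, g=2)

divisor = 18.955  # assumed: the Table 2 entry quoted in the text for 2 + 7 + 2
standard_error = math.sqrt(tssd / divisor)

print(f"trimmed mean   = {mean:.2f}")
print(f"TSSD           = {tssd:.2f}")
print(f"standard error = {standard_error:.2f}")
```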
I.

GENERAL CONSIDERATIONS
We shall use the following notations:

$$y_1 \le y_2 \le \cdots \le y_n$$

are the ordered values in a sample;

ave ( )

indicates the average value, or expectation, of the expression in ( );

var ( )

indicates the variance of the quantity following, as defined by

$$\mathrm{var}\,u = \mathrm{ave}(u^2) - (\mathrm{ave}\,u)^2 ;$$

when clarity or precision require indication of the distribution from which the
samples are drawn,

$\mathrm{ave}_N(\ )$ and $\mathrm{var}_N(\ )$

will refer to averages and variances based on the standard normal distribution
(average zero and variance unity), while

$\mathrm{ave}_R(\ )$ and $\mathrm{var}_R(\ )$

will refer to averages and variances based on the standard rectangular distribution
(on the interval 0 ≤ p ≤ 1).

The quantities of most interest to us will be denoted as follows, suppressing
dependences on n, g, and the particular sample:

$\bar{y} = \frac{1}{n}(y_1 + y_2 + \cdots + y_n)$ = untrimmed mean,
g = number trimmed from each end,
h = n - 2g = number remaining,

$\breve{y} = \frac{1}{n-2g}(y_{g+1} + y_{g+2} + \cdots + y_{n-g})$ = trimmed mean,

$T = (y_{g+1} - \breve{y})^2 + (y_{g+2} - \breve{y})^2 + \cdots + (y_{n-g} - \breve{y})^2$
= trimmed sum of squares of deviations = TSSD.



When we do need to bring in dependence on n and g, we shall often do
this by writing g + h + g as an argument. In such cases it will be understood
that g + h + g is the original sample size and that h is the trimmed sample
size.

We shall also systematically let $\sum^*$ refer to summation for i (or j)
from g + 1 to n - g (a total of n - 2g = h values of i), and $\sum^*\sum^*$
to double summation over the same range. Then

$$\breve{y} = \frac{1}{h}{\sum}^* y_i , \qquad
T = {\sum}^* y_i^2 - \frac{1}{h}\Bigl({\sum}^* y_i\Bigr)^2 ,$$

as may easily be verified.
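
The verification is a one-line expansion. It is spelled out here as a sketch, writing $\breve{y}$ for the trimmed mean and $\sum^*$ for the sum over the h retained values (both notational assumptions of this note):

```latex
\begin{aligned}
T &= {\sum}^* (y_i - \breve{y})^2
   = {\sum}^* y_i^2 - 2\breve{y}\,{\sum}^* y_i + h\,\breve{y}^2 \\
  &= {\sum}^* y_i^2 - 2h\,\breve{y}^2 + h\,\breve{y}^2
   = {\sum}^* y_i^2 - h\,\breve{y}^2
   = {\sum}^* y_i^2 - \frac{1}{h}\Bigl({\sum}^* y_i\Bigr)^2 ,
\end{aligned}
```

using ${\sum}^* y_i = h\,\breve{y}$ in the second line.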


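6. Relation to order-statistic moments.

The quantities that concern us most can be expressed in terms of
order-statistic moments as follows:

$$\mathrm{var}\,\breve{y} = \mathrm{ave}(\breve{y}^2) - (\mathrm{ave}\,\breve{y})^2
  = \mathrm{ave}(\breve{y}^2)
  = \frac{1}{h^2}\,{\sum}^*{\sum}^*\,\mathrm{ave}(y_i y_j) ,$$

$$\mathrm{ave}\,T = {\sum}^*\,\mathrm{ave}(y_i^2)
  - \frac{1}{h}\,{\sum}^*{\sum}^*\,\mathrm{ave}(y_i y_j) ,$$

$$\mathrm{Div}(g+h+g) = \frac{\mathrm{ave}\,T}{\mathrm{var}\,\breve{y}}
  = \frac{h^2\,{\sum}^*\,\mathrm{ave}(y_i^2)}{{\sum}^*{\sum}^*\,\mathrm{ave}(y_i y_j)} - h .$$

(The middle equality for var $\breve{y}$ holds because ave $\breve{y}$ = 0 for a
symmetric parent centered at zero, symmetrically trimmed.) Again we write
$\mathrm{Div}_N(g+h+g)$ or $\mathrm{Div}_R(g+h+g)$ when needed for clarity or
precision.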


7. Normal distributions.

For the special case of sampling from a standard normal population, we
can refer to Teichroew [3] for the values of $\mathrm{ave}_N(y_i)$,
$\mathrm{ave}_N(y_i^2)$ and $\mathrm{ave}_N(y_i y_j)$
for n ≤ 20. (The corresponding variances and covariances are given by Sarhan
and Greenberg a few pages later [2].)

Thus normal-theory variances of $\breve{y}$'s and normal theory averages of TSSD's
are easily available for normal samples of size no more than twenty. For example,
the case of 17 = 6 + 5 + 6, where a sample of 17 is trimmed to the central
five observations, yields

$${\sum}^*{\sum}^*\,\mathrm{ave}(y_i y_j) = 1.92257699$$

$${\sum}^*\,\mathrm{ave}(y_i^2) = .674220047$$

whence

$$\mathrm{var}_N\,\breve{y} = \frac{1.92257699}{25} = .076903080$$

$$\mathrm{Div}_N(6+5+6) = \frac{25\,(.674220047)}{1.92257699} - 5 = 3.7671397$$
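
The step from the two moment sums to the variance and the divisor is mechanical, and can be sketched as below. The function name is hypothetical; the two sums for 17 = 6 + 5 + 6 are the ones quoted in the text, and the untrimmed check uses the fact that for five independent standard normal values both the sum of ave(y_i^2) and the double sum of ave(y_i y_j) equal 5.

```python
def divisor_from_moments(sum_sq, sum_prod, h):
    """Return (var of trimmed mean, ave of TSSD, divisor) for a zero-mean parent.

    sum_sq   : sum over retained i of ave(y_i^2)
    sum_prod : double sum over retained i, j of ave(y_i * y_j)
    h        : number of retained observations
    """
    var_trimmed_mean = sum_prod / h ** 2
    ave_tssd = sum_sq - sum_prod / h
    return var_trimmed_mean, ave_tssd, ave_tssd / var_trimmed_mean


# The 17 = 6 + 5 + 6 case, with the sums quoted in the text.
var_tm, ave_t, div = divisor_from_moments(0.674220047, 1.92257699, 5)
print(f"var = {var_tm:.9f}, divisor = {div:.7f}")

# Untrimmed sample of 5 from the standard normal: both sums equal 5.
_, _, div_untrimmed = divisor_from_moments(5.0, 5.0, 5)
print(f"untrimmed divisor = {div_untrimmed:.1f}")
```

The second call returns h(h - 1) = 20, the familiar divisor that turns an ordinary sum of squares of deviations into an estimate of the variance of a mean of 5.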
Notice that if we had had an initial sample of 5, and had not trimmed it,
the correct divisor would have been

Div (0 + 5 + 0) = 20.

Thus we must treat the sum of squares of deviations from a trimmed sample quite
differently from the sum of squares of deviations from an untrimmed sample. This
is emphasized by Table 3, which gives values of the ratio

$$\frac{\mathrm{Div}_N(0 + h + 0)}{\mathrm{Div}_N(g + h + g)}$$

of the divisors which are appropriate on normal theory in the two cases.

TABLE 3

$$\frac{\mathrm{Div}_N(0 + h + 0)}{\mathrm{Div}_N(g + h + g)}$$

Thus changes in this factor with shape deserve exploration. For this purpose,
exploration of shapes far more extreme than are likely to arise in practice is
reasonable since the aim is to discover magnitude of dependence rather than
to balance losses.

For this purpose, the accessibility of order statistic moments for
rectangular distributions is convenient and useful, since the rather extreme
shape of the rectangular distribution is not a handicap.

It is shown in Section 11 that, for a rectangular distribution of unit
length (which will serve us as well as any other as a standard rectangular
distribution), if g values are deleted from each tail of a sample of size
n = h + 2g = g + h + g, leaving h central values for the computation of the
trimmed mean and the trimmed sum of squares of deviations, then:

rectangular variance of trimmed mean =

$$\mathrm{var}_R\,\breve{y} = \frac{3h(n+1) - 2(h^2-1)}{12h(n+1)(n+2)} ,$$

average of trimmed sum of squares of deviations =

$$\mathrm{ave}_R\,T = \frac{(h+2)(h+1)(h-1)}{12(n+2)(n+1)} ,$$

reciprocal of divisor for conversion =

$$\frac{1}{\mathrm{Div}_R(g+h+g)} = \frac{3(n+1)}{(h+2)(h+1)(h-1)} - \frac{2}{(h+2)h} .$$

Multiplication of the values already obtained for the normal-theory
conversion-divisor by the reciprocal of the rectangular-theory
conversion-divisor yields the values of ratios of divisors set forth in Table 4.
It is clear from this table that the normal theory conversion-divisor is in any
case approximately equal to the rectangular-theory conversion-divisor, and, as
would be expected, the approximation becomes better as the amount of trimming
increases.
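
A ratio of this kind can be recomputed from a normal-theory reciprocal and the rectangular formula. The sketch below assumes the normal-theory reciprocals .99088 for 1 + 2 + 1 and 2.99081 for 5 + 2 + 5, as tabulated for h = 2 in Table 5; the function names are hypothetical.

```python
from fractions import Fraction


def recip_div_rect(n, h):
    """Reciprocal of the rectangular-theory divisor for h retained out of n."""
    return Fraction(3 * (n + 1), (h + 2) * (h + 1) * (h - 1)) - Fraction(2, (h + 2) * h)


def normal_over_rect(recip_div_normal, n, h):
    """Ratio Div_N / Div_R, given the reciprocal of the normal-theory divisor."""
    return float(recip_div_rect(n, h)) / recip_div_normal


ratio_1_2_1 = normal_over_rect(0.99088, n=4, h=2)   # g = 1, h = 2
ratio_5_2_5 = normal_over_rect(2.99081, n=12, h=2)  # g = 5, h = 2
print(f"{ratio_1_2_1:.4f}  {ratio_5_2_5:.4f}")
```

Both ratios lie within about one percent of unity, in line with the remark that the two theories give nearly the same divisor.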


TABLE 4

$$\frac{\mathrm{Div}_N(g + h + g)}{\mathrm{Div}_R(g + h + g)}$$

 h    g=1     g=2     g=3     g=4     g=5     g=6     g=7     g=8     g=9
 2   1.0092  1.0071  1.0053   ...    1.0031  1.0024  1.0020  1.0017  1.0014
 3   1.0025  1.0021  1.0017  1.0015  1.0011  1.0009  1.0007  1.0006
 4   .99470  .99570  .99678  .99756  .99811  .99830  .99878  .99899
 5   .98733  .98908  .99142  .99325  .99460  .99560  .99637
 6   .98079  .98271  .98600  .98873  .99082  .99241  .99363
 7   .97513  .97682   ...    .98419  .98693  .98907
 8   .97028  .97146  .97575  .97977  .98304  .98566
 9   .96615  .96662  .97109  .97551  .97923
10   .96263  .96227  .96676  .97146  .97554
11   .95963  .95838  .96275  .96763
12   .95708  .95491  .95907  .96403
13   .95490  .95180  .95568
14   .95304  .94903  .95256
15   .95145  .94653
16   .95010  .94431
17   .94894
18   .94795
9. Future work.

Besides the questions of (i) relative efficiency for reasonable population
shapes, (ii) allowance for selection bias when the amount of trimming is allowed
to vary from sample to sample, and (iii) improvement from an
unbiased-estimate-of-variance procedure to a confidence procedure, all of which
are very important to the practical use of "trimmed" techniques, the
considerations of later sections about the rectangular case make it clear that
normal theory and rectangular theory can be usefully compared for other sorts of
"trimmed" procedures. The mid-range (mean of highest and lowest values) of the
trimmed sample needs to be considered as an indicator of location. It is, of
course, an inner (or quasi-) midrange of the entire sample. For both trimmed
means and inner midranges it is appropriate to consider at least the following
as bases for estimating variability:

(a) sum of squares of deviations for the same trimmed sample,

(b) square of the range of the same trimmed sample,

(c) sum of squares of deviations for a less vigorously trimmed sample,

(d) square of the range of a less vigorously trimmed sample.

It is hoped to report on some of these shortly.

II.

DERIVATIONS, DISCUSSIONS, DETAILS

10. Empirical behavior of Div (g + h + g).

When the normal-theory behavior of Div (g + h + g) was examined, it was
noticed that, for h fixed and g changing, the first differences of its
reciprocal decreased somewhat for h > 3, increased slightly for h = 2, 3,
but in both cases rapidly became constant as g increased. This is illustrated,
for two values of h, in Table 5. This observation immediately makes it
possible to extend the tables of divisors beyond total sample size 20 by
empirical extrapolation. Such a process could be used to quite good effect
without further support. However, its use will be simpler, and somewhat more
precise, if it can borrow support from algebraic considerations which apply
either to some other distribution shape or in some asymptotic sense. Results
for the rectangular case are easily obtained, and may be shown to hold
asymptotically for all distributions smooth at the median.
TABLE 5

ILLUSTRATION OF APPROACH OF G-WISE DIFFERENCES OF RECIPROCALS
OF DIVISORS

                    h = 2                                      h = 5
 g   1/Div_N(g+2+g)    δ_g      δ(δ_g)         g   1/Div_N(g+5+g)    δ_g      δ(δ_g)
 0       .50000       .49088   +.00763         0       .05000       .07277   -.03704
 1       .99088       .49851   +.00165         2       .12277       .03573   -.00006
 2      1.48939       .50016   +.00043         3       .15850       .03567   -.00003
 3      1.98955       .50059   +.00008         4       .19417       .03564
 4      2.49014       .50067   -.00003         5       .22981       .03564
 5      2.99081       .50064   -.00006         6       .26545       .03564
 6      3.49145       .50058   -.00005         7       .30109
 7      3.99203       .50053   -.00006
 8      4.49256       .50047
 9      4.99303

Each δ_g is the difference between the entry in the next listed row and the
entry in its own row; each δ(δ_g) is the corresponding difference of the δ_g.
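
The empirical extrapolation described above amounts to adding a nearly constant step to the last tabulated reciprocal. The following is a minimal sketch, assuming the last tabulated reciprocal for h = 5 is .30109 and the last observed first difference is .03564 (the values in Table 5); the function names are hypothetical, and the rectangular-theory step 6/((h+2)(h+1)(h-1)) is included only for comparison.

```python
from fractions import Fraction


def rectangular_step(h):
    """Rectangular-theory increase in 1/Div when g increases by one (n by two)."""
    return Fraction(6, (h + 2) * (h + 1) * (h - 1))


def extrapolate_reciprocal(last_reciprocal, step, extra_steps):
    """Extend a column of reciprocals by repeating a constant step."""
    return last_reciprocal + step * extra_steps


last_tabulated = 0.30109   # last h = 5 reciprocal in Table 5
empirical_step = 0.03564   # last observed first difference for h = 5

next_reciprocal = extrapolate_reciprocal(last_tabulated, empirical_step, 1)
next_divisor = 1 / next_reciprocal
print(f"extrapolated reciprocal {next_reciprocal:.5f}, divisor {next_divisor:.4f}")
print(f"rectangular-theory step for h = 5: {float(rectangular_step(5)):.5f}")
```

The empirical step and the rectangular-theory step agree to about three significant figures, which is what makes the algebraic support useful.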


11. The rectangular case.

The distributions of order statistics of samples from the standard
rectangular distribution are well-known [1] as are expressions for their moments.

If $p_i$ and $p_j$ are the ith and jth order statistics of a sample of n from the
standard rectangular, where i < j, then

$$\mathrm{ave}(p_j - p_i)^2 = (\mathrm{ave}\,p_j - \mathrm{ave}\,p_i)^2
  + \mathrm{var}\,p_j - 2\,\mathrm{cov}(p_j, p_i) + \mathrm{var}\,p_i$$

$$= \left(\frac{j-i}{n+1}\right)^2 + \frac{1}{(n+1)^2(n+2)}
  \bigl( j(n-j+1) - 2i(n-j+1) + i(n-i+1) \bigr) .$$

This is a function of n and j - i alone, and hence equal to
$\mathrm{ave}(p_{j-i})^2$, as would be expected from the symmetric distribution
of equivalent blocks [4].

What is important next is that $(n+1)(n+2)\,\mathrm{ave}(p_j - p_i)^2$ depends
only on j - i. As i and j run over any h consecutive indices of a sample of n,
the values of j - i are exactly the same, and occur with the same multiplicity,
as if i and j ran through a sample of size h. Consequently

$$(n+1)(n+2)\,{\sum}^*{\sum}^*\,\mathrm{ave}(p_j - p_i)^2
  = (h+1)(h+2)\,\sum\sum\,\mathrm{ave}(p_j - p_i)^2$$

where $\sum^*\sum^*$ is over some h consecutive values of a sample of n and
$\sum\sum$ is over all values from 1 to h of a sample of h.
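
The dependence on j - i alone can be confirmed exactly from the standard moments of rectangular order statistics, ave p_i = i/(n+1) and cov(p_i, p_j) = i(n+1-j)/((n+1)^2 (n+2)) for i ≤ j. The sketch below is an illustration with hypothetical function names; it uses exact rational arithmetic and also evaluates the average trimmed sum of squares by brute-force double summation.

```python
from fractions import Fraction


def mean(i, n):
    """ave p_i for the i-th order statistic of n standard rectangular values."""
    return Fraction(i, n + 1)


def cov(i, j, n):
    """cov(p_i, p_j); with i == j this is var p_i."""
    lo, hi = min(i, j), max(i, j)
    return Fraction(lo * (n + 1 - hi), (n + 1) ** 2 * (n + 2))


def ave_sq_diff(i, j, n):
    """ave (p_j - p_i)^2, assembled from means, variances and the covariance."""
    return (mean(j, n) - mean(i, n)) ** 2 + cov(j, j, n) - 2 * cov(i, j, n) + cov(i, i, n)


def ave_trimmed_ssd(n, g):
    """ave T for the central h = n - 2g values, via 2h T = double sum of squared gaps."""
    h = n - 2 * g
    kept = range(g + 1, n - g + 1)
    return sum(ave_sq_diff(i, j, n) for i in kept for j in kept) / (2 * h)


n = 11
for i in range(1, n + 1):
    for j in range(i + 1, n + 1):
        k = j - i
        assert ave_sq_diff(i, j, n) == Fraction(k * (k + 1), (n + 1) * (n + 2))

print(ave_trimmed_ssd(11, 2))  # central 7 of 11
```

For the central 7 of 11 the brute-force sum agrees with (h+2)(h+1)(h-1)/(12(n+1)(n+2)) evaluated at h = 7, n = 11.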

Let now $p_1 \le p_2 \le p_3 \le \cdots \le p_h$ be the order statistics of a
sample of h (not n) from the standard rectangular, and let
$p_1^* \le p_2^* \le p_3^* \le \cdots \le p_n^*$
be the order statistics of an independent sample of n (not h) from the same
distribution. Let T(p) and T(p*) be the corresponding TSSD's, in the first
case for all h values and in the second case for the central h values. Since

$$2h \cdot T(p^*) = {\sum}^*{\sum}^* (p_j^* - p_i^*)^2$$

$$2h \cdot T(p) = \sum\sum (p_j - p_i)^2$$

we must have

$$\mathrm{ave}_R\,T(p^*) = \frac{1}{2h}\,\mathrm{ave}_R\,{\sum}^*{\sum}^*(p_j^* - p_i^*)^2
  = \frac{1}{2h(n+1)(n+2)}\,{\sum}^*{\sum}^*\,\mathrm{ave}_R\,(n+1)(n+2)(p_j^* - p_i^*)^2$$

$$= \frac{1}{2h(n+1)(n+2)}\,\sum\sum\,\mathrm{ave}_R\,(h+1)(h+2)(p_j - p_i)^2
  = \frac{(h+1)(h+2)}{(n+1)(n+2)}\,\mathrm{ave}_R\,T(p)
  = \frac{(h-1)(h+1)(h+2)}{12(n+1)(n+2)}$$

for, since T(p) is an untrimmed sum of squares for a sample of size h,
ave T(p) = (h - 1)$\sigma^2$ for any distribution.


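Now let us turn to $\mathrm{var}\,\breve{p}$. Recall that, for i ≤ j,

$$(n+1)^2(n+2)\,\mathrm{cov}(p_i, p_j) = i(n+1-j)
  = \left(\frac{n+1}{2} + c\right)\left(\frac{n+1}{2} - d\right)
  = \left(\frac{n+1}{2}\right)^2 - \left(\frac{n+1}{2}\right)(d - c) - cd$$

where 2i = (n+1) + 2c, 2j = (n+1) + 2d, so that c and d range over h
values with average zero and unit spacing between adjacent values. Hence

$$(n+1)^2(n+2)\,{\sum}^*{\sum}^*\,\mathrm{cov}(p_i, p_j)
  = h^2\left(\frac{n+1}{2}\right)^2
  - 2\,\frac{n+1}{2}\sum_{|d-c|=1}^{h-1}(h - |d-c|)\,|d-c|$$

since $\sum^*\sum^* cd = (\sum^* c)(\sum^* d) = 0 \cdot 0 = 0$.

Now

$$\sum_{k=1}^{h}(h-k)\,k = h\sum k - \sum k^2
  = \frac{h^2(h+1)}{2} - \frac{h(h+1)(2h+1)}{6}
  = \frac{h(h+1)}{6}(3h - 2h - 1) = \binom{h+1}{3}$$

so that

$$\mathrm{var}\,\breve{p} = \frac{1}{h^2}\,{\sum}^*{\sum}^*\,\mathrm{cov}(p_j, p_i)
  = \frac{1}{h^2(n+1)^2(n+2)}\left[h^2\left(\frac{n+1}{2}\right)^2
  - 2\,\frac{n+1}{2}\binom{h+1}{3}\right]
  = \frac{1}{4(n+2)}\left[1 - \frac{2(h^2-1)}{3h(n+1)}\right]$$

(for h = 1, this checks with the variance of the median, namely 1/4(n+2),
while for h = n it reduces to 1/12n, as it should).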
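
The closed form for the variance of the trimmed mean can be checked against a direct double sum of covariances. This is a sketch with hypothetical function names, using the standard covariance of rectangular order statistics, i(n+1-j)/((n+1)^2 (n+2)) for i ≤ j, and exact rational arithmetic.

```python
from fractions import Fraction


def cov(i, j, n):
    """cov(p_i, p_j) for order statistics of n standard rectangular values."""
    lo, hi = min(i, j), max(i, j)
    return Fraction(lo * (n + 1 - hi), (n + 1) ** 2 * (n + 2))


def var_trimmed_mean_rect(n, g):
    """Variance of the mean of the central h = n - 2g order statistics, by summation."""
    h = n - 2 * g
    kept = range(g + 1, n - g + 1)
    return sum(cov(i, j, n) for i in kept for j in kept) / h ** 2


def var_closed_form(n, h):
    """(3h(n+1) - 2(h^2 - 1)) / (12 h (n+1)(n+2)), the same quantity in closed form."""
    return Fraction(3 * h * (n + 1) - 2 * (h * h - 1), 12 * h * (n + 1) * (n + 2))


print(var_trimmed_mean_rect(11, 2), var_closed_form(11, 7))
print(var_trimmed_mean_rect(5, 2))   # median of 5
print(var_trimmed_mean_rect(9, 0))   # untrimmed mean of 9
```

The last two lines reproduce the two checks made in the text: the median of n gives 1/4(n+2), and the untrimmed mean gives 1/12n.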

The conversion divisor is thus

$$\mathrm{Div}_R(g + h + g)
  = \frac{\dfrac{(h+2)(h+1)(h-1)}{12(n+2)(n+1)}}{\dfrac{3h(n+1) - 2(h^2-1)}{12h(n+1)(n+2)}}
  = \frac{(h+2)(h+1)(h)(h-1)}{3h(n+1) - 2(h^2-1)}
  = \frac{(h+2)(h+1)(h)(h-1)}{3h \cdot n - (2h^2 - 3h - 2)}$$

which reduces to (n)(n-1) when h = n, as it should. Its reciprocal can be
written

$$\frac{1}{\mathrm{Div}_R(g + h + g)} = \frac{3(n+1)}{(h+2)(h+1)(h-1)} - \frac{2}{(h+2)(h)}$$

which is obviously linear in n.

If we fix h, and let g increase unit by unit, n will increase in
steps of 2, and the rectangular theory reciprocal will increase in steps of

$$\frac{6}{(h+2)(h+1)(h-1)} .$$
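
The rectangular-theory divisor and its reciprocal are easy to tabulate for any n and h. The following is a minimal sketch with hypothetical function names; it checks the two forms against each other, the reduction to n(n - 1) when h = n, and the constant step of the reciprocal.

```python
from fractions import Fraction


def div_rect(n, h):
    """Rectangular-theory conversion divisor for h retained out of n (h >= 2)."""
    return Fraction((h + 2) * (h + 1) * h * (h - 1), 3 * h * (n + 1) - 2 * (h * h - 1))


def recip_div_rect(n, h):
    """The same divisor's reciprocal, in the form that is linear in n."""
    return Fraction(3 * (n + 1), (h + 2) * (h + 1) * (h - 1)) - Fraction(2, (h + 2) * h)


# 2 + 7 + 2: seven retained out of eleven.
print(div_rect(11, 7), float(div_rect(11, 7)))

# Step of the reciprocal when g increases by one (n by two), for h = 5.
step = recip_div_rect(9, 5) - recip_div_rect(7, 5)
print(step)
```

Because the reciprocal is linear in n, a table of rectangular-theory divisors for any total sample size needs only these two functions.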

12. The asymptotic case.

Consider next the case of an arbitrary distribution where h is fixed and
n is large. If y = r(p) is the representing function of the distribution,
so that F(r(p)) = p where F(y) is the corresponding cumulative, then

$$y_i = r(p_i)$$

where $y_1, y_2, \ldots, y_n$ are the order statistics of a sample of n y's and
$p_1, p_2, \ldots, p_n$ are the order statistics of the corresponding sample of
n p's.

Put $q' = p_g$ and $q'' = p_{n+1-g}$ so that the h central $p_i$ fall between
q' and q''. It is a consequence of Wald's principle [4] that, conditional
upon the values of q' and q'', these h $p_i$'s are distributed like a sample
of h from the rectangular distribution with q' and q'' as end points.
Conditional on q' and q'' we have the following averages and variances:

$$\mathrm{ave}(\breve{p} \mid q', q'') = \tfrac{1}{2}(q' + q'')$$

$$\mathrm{var}(\breve{p} \mid q', q'') = \frac{1}{12h}(q'' - q')^2$$

$$\mathrm{ave}(T(p) \mid q', q'') = \frac{h-1}{12}(q'' - q')^2$$

whence

$$\mathrm{var}_R\,\breve{p} = \mathrm{ave}\,\mathrm{var}(\breve{p} \mid q', q'')
  + \mathrm{var}\,\mathrm{ave}(\breve{p} \mid q', q'')
  = \frac{1}{12h}\,\mathrm{ave}(q'' - q')^2 + \frac{1}{4}\,\mathrm{var}(q'' + q')$$

$$\mathrm{ave}_R\,T(p) = \frac{h-1}{12}\,\mathrm{ave}(q'' - q')^2$$

so that the reciprocal of the conversion factor satisfies

$$\frac{1}{\mathrm{Div}_R(g + h + g)} = \frac{\mathrm{var}\,\breve{p}}{\mathrm{ave}\,T(p)}
  = \frac{1}{h(h-1)}\left[1 + 3h\,\frac{\mathrm{var}(q'' + q')}{\mathrm{ave}(q'' - q')^2}\right] .$$
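
Dividing the two conditional-moment expressions gives a bracket of the form 1 + 3h var(q'' + q') / ave(q'' - q')^2, and this can be confirmed exactly for the rectangular case, where the moments of q' = p_g and q'' = p_{n+1-g} are known. The sketch below is an illustration with hypothetical function names; it requires g ≥ 1 and h ≥ 2, and compares the result with the closed-form reciprocal that is linear in n.

```python
from fractions import Fraction


def cov(i, j, n):
    """cov(p_i, p_j) for order statistics of n standard rectangular values."""
    lo, hi = min(i, j), max(i, j)
    return Fraction(lo * (n + 1 - hi), (n + 1) ** 2 * (n + 2))


def recip_via_conditioning(n, g):
    """1/Div_R from the moments of the bracketing order statistics q' and q''."""
    h = n - 2 * g
    lo, hi = g, n + 1 - g
    var_sum = cov(lo, lo, n) + cov(hi, hi, n) + 2 * cov(lo, hi, n)
    mean_gap = Fraction(hi - lo, n + 1)
    ave_sq_gap = mean_gap ** 2 + cov(lo, lo, n) + cov(hi, hi, n) - 2 * cov(lo, hi, n)
    return (1 + 3 * h * var_sum / ave_sq_gap) / (h * (h - 1))


def recip_closed_form(n, h):
    """3(n+1)/((h+2)(h+1)(h-1)) - 2/((h+2)h)."""
    return Fraction(3 * (n + 1), (h + 2) * (h + 1) * (h - 1)) - Fraction(2, (h + 2) * h)


print(recip_via_conditioning(11, 2), recip_closed_form(11, 7))
```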

If now $z' = y_g$ and $z'' = y_{n+1-g}$, so that the h central $y_i$ fall
between z' and z'', it again follows from Wald's principle that, conditional
on the values of z' and z'', these $y_i$ are distributed like a sample from
whatever may be the distribution of y truncated onto the interval from z'
to z''.

If n is very large, the interval from z' to z'' will be short and will
lie close to the median of the distribution of y. If that distribution is
smooth near its median, the result of truncating it onto any small interval near
the median will be very nearly a rectangular distribution.

Hence

$$\frac{h(h-1)}{\mathrm{Div}_D(g + h + g)}
  = h(h-1)\,\frac{\mathrm{var}\,\breve{y}}{\mathrm{ave}\,T(y)}
  \approx 1 + 3h\,\frac{\mathrm{var}(z'' + z')}{\mathrm{ave}(z'' - z')^2}$$

where D stands for any distribution smooth near the median, and T(y) is the
TSSD for the y's. Moreover,

$$z' = r(q') \quad\text{and}\quad z'' = r(q'')$$

where r will behave very nearly linearly, so that

$$\frac{\mathrm{var}(z'' + z')}{\mathrm{ave}(z'' - z')^2}
  \approx \frac{\mathrm{var}(q'' + q')}{\mathrm{ave}(q'' - q')^2} ;$$

consequently

$$h(h-1)\,\frac{\mathrm{var}\,\breve{y}}{\mathrm{ave}\,T(y)}
  \approx 1 + 3h\,\frac{\mathrm{var}(q'' + q')}{\mathrm{ave}(q'' - q')^2}
  = h(h-1)\,\frac{\mathrm{var}\,\breve{p}}{\mathrm{ave}\,T(p)}$$

and we see that asymptotically, for fixed h and very large n, the value
of the conversion factor will not depend upon the shape of the parent
distribution, so long as that distribution is smooth near the median.

If the distribution of y is symmetric, then

$$r(p) = a + b\,(p - \tfrac{1}{2}) + c\,(p - \tfrac{1}{2})^3 + \cdots$$

and deviations from linearity are of order $(p - \tfrac{1}{2})^2$ times the
linear deviations. Since $(p - \tfrac{1}{2})^2$ is of order 1/n, the fractional
deviations of the conversion factor for any two symmetrical distributions from
one another are at most of order 1/n for each fixed h.

Suppose that, for two symmetrical parent distributions, the conversion
factors for some h satisfy:

$$\mathrm{factor}_1^{-1} = A_1 \cdot (n+1) + B_1 + C_1(n)$$

$$\mathrm{factor}_2^{-1} = A_2 \cdot (n+1) + B_2 + C_2(n)$$

where $C_1(n)$ and $C_2(n)$ both tend to zero as n increases. Their ratio can
only approach unity as n tends to infinity if $A_1 = A_2$. For the standard
rectangular distribution

$$\mathrm{factor}^{-1} = \frac{3(n+1)}{(h+2)(h+1)(h-1)} - \frac{2}{(h+2)h} .$$

Consequently, for any symmetrical distribution for which the general form
applies,

$$\mathrm{factor}^{-1} = \frac{3(n+1)}{(h+2)(h+1)(h-1)} + B + C(n)$$

where C(n) tends to zero, while the difference between the reciprocals of
the factor for n and n-1 will be

$$\delta(\mathrm{factor}^{-1}) = \frac{3}{(h+2)(h+1)(h-1)} + C(n) - C(n-1) .$$

13. Suggested alternatives.

The discussion of the last paragraph shows, upon reexamination, that the
reason why the conversion factor does not depend upon h alone lies in the
ratio

$$\frac{\mathrm{var}\,[\text{average of distribution trimmed to } (z', z'')]}
       {\mathrm{ave}\,[\text{variance of distribution trimmed to } (z', z'')]} .$$

Thus it appears that perhaps the most natural way to build in some compensation
is to use as a basis for estimating the variance of the trimmed mean of h
values, not the TSSD or squared range of the same h central values, but the
TSSD or squared range of a greater number of values, perhaps 1 + h + 1 or
2 + h + 2 or 3 + h + 3. These possibilities will be considered numerically
in a later report.